 zero-shot performance


Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Neural Information Processing Systems

Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion games (PEGs), an important class of real-world games from the fields of robotics and security, require exponential time to be solved exactly. When the underlying graph structure varies, even state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, the framework is generally applicable to both the pursuer and evader sides in both no-exit and multi-exit scenarios.


Efficient Equivariant Transfer Learning from Pretrained Models

Neural Information Processing Systems

Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks even with limited data. Recent works by Basu et al. (2023) and Kaba et al. (2022) propose group averaging (equitune) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While Kaba et al. (2022) are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better quality features for certain transformations than for others, so simply averaging them is deleterious. Hence, we propose λ-equitune, which averages the features using importance weights, λs. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that λ-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of Kaba et al. (2022), used with appropriate loss functions, which we call equizero, also gives excellent zero-shot and finetuned performance.
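The importance-weighted averaging described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the group of 90-degree rotations, an invariant (rather than general equivariant) output, and a normalized weighted sum of the form Σ_g λ(gx) M(gx) / Σ_g λ(gx). The names `toy_model` and `toy_weight` are hypothetical stand-ins for a pretrained feature extractor and the small weight network.

```python
import numpy as np

def c4_orbit(x):
    """Return the four 90-degree rotations of a square 2D array."""
    return [np.rot90(x, k) for k in range(4)]

def lambda_weighted_invariant(x, model, weight_fn):
    """Weighted average of model features over the rotation orbit of x.

    With constant weights this reduces to plain group averaging.
    """
    orbit = c4_orbit(x)
    feats = np.stack([model(gx) for gx in orbit])     # shape (4, d)
    lams = np.array([weight_fn(gx) for gx in orbit])  # shape (4,), positive
    return (lams[:, None] * feats).sum(axis=0) / lams.sum()

# Hypothetical stand-ins (assumptions, not from the paper).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 3))
v = rng.normal(size=16)

def toy_model(img):
    # A non-equivariant feature map: fixed linear layer plus tanh.
    return np.tanh(img.reshape(-1) @ W)

def toy_weight(img):
    # Positive importance weight; exp keeps the normalizer nonzero.
    return float(np.exp(0.1 * (img.reshape(-1) @ v)))

x = rng.normal(size=(4, 4))
out = lambda_weighted_invariant(x, toy_model, toy_weight)
out_rot = lambda_weighted_invariant(np.rot90(x), toy_model, toy_weight)
```

Rotating the input only permutes the orbit, so `out` and `out_rot` agree up to floating-point error even though `toy_model` itself is not rotation-invariant.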


Compressing Large Language Models using Low Rank and Low Precision Decomposition

Neural Information Processing Systems

This work introduces CALDERA, a new post-training LLM compression algorithm that harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$. Here, $\mathbf{L}$ and $\mathbf{R}$ are low-rank factors, and the entries of $\mathbf{Q}$, $\mathbf{L}$ and $\mathbf{R}$ are quantized. The model is compressed by substituting each layer with its $\mathbf{Q} + \mathbf{L}\mathbf{R}$ decomposition, and the zero-shot performance of the compressed model is evaluated. Additionally, $\mathbf{L}$ and $\mathbf{R}$ are readily amenable to low-rank adaptation, which further improves the zero-shot performance.
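To make the shape of a $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$ decomposition concrete, here is a simplified NumPy sketch. It is not the CALDERA algorithm: it assumes a plain uniform quantizer, a truncated SVD for the low-rank factors, and a naive alternating loop, and the function names, bit widths, and matrix sizes are all illustrative choices.

```python
import numpy as np

def quantize_uniform(a, bits):
    """Round entries of a onto a uniform grid with 2**bits levels."""
    lo, hi = a.min(), a.max()
    if hi == lo:
        return a.copy()
    scale = (hi - lo) / (2**bits - 1)
    return np.round((a - lo) / scale) * scale + lo

def low_rank_factors(a, rank):
    """Best rank-`rank` factors of a (in Frobenius norm) via truncated SVD."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    root = np.sqrt(s[:rank])
    return u[:, :rank] * root, root[:, None] * vt[:rank]

def decompose(w, rank, q_bits=2, lr_bits=4, iters=5):
    """Alternate between fitting Q to W - LR and fitting LR to W - Q."""
    l = np.zeros((w.shape[0], rank))
    r = np.zeros((rank, w.shape[1]))
    for _ in range(iters):
        q = quantize_uniform(w - l @ r, q_bits)    # low-precision backbone
        l, r = low_rank_factors(w - q, rank)       # low-rank correction
        l = quantize_uniform(l, lr_bits)
        r = quantize_uniform(r, lr_bits)
    return q, l, r

# Toy weight matrix: low-rank signal plus small noise (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(size=(32, 4)) @ rng.normal(size=(4, 24))
w += 0.05 * rng.normal(size=w.shape)

q, l, r = decompose(w, rank=4)
rel_err = np.linalg.norm(w - (q + l @ r)) / np.linalg.norm(w)
```

Here `rel_err` is the relative Frobenius error of the reconstruction. Because `l` and `r` are kept as explicit factors, they could in principle be fine-tuned afterwards, which is the property the abstract refers to as amenability to low-rank adaptation.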


A Appendix A.1 UniBench Implementation Details We have developed UniBench

Neural Information Processing Systems

To evaluate new VLMs beyond the 59 already implemented, users need to follow Code Snippet 2. Users would need to create a class that inherits from ... As described in Section 2.2, LLM-style models are defined as models that generate tokens/text as output, which makes them hard to compare with CLIP-style VLMs. Following the methodology of Matsuura et al. [2023], we evaluated Llava 1.5 [Liu et al., 2023], an LLM-style VLM, on various benchmark types in UniBench (Table 2). Scaling improves many benchmarks, but offers little benefit for reasoning and relation. Figure 8: Benchmark capabilities performance does not scale with dataset and model size. Median zero-shot performance of models on various benchmark capabilities.



Checklist 1. For all authors (a)

Neural Information Processing Systems

Do the main claims made in the abstract and introduction accurately reflect the paper's ...
If you ran experiments (e.g. for benchmarks)...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See A.2
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they ...
Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)?
Did you include the total amount of compute and the type of resources used (e.g., type ...
Did you include any new assets either in the supplemental material or as a URL? [Yes]
Did you discuss whether and how consent was obtained from people whose data you're ...
If you used crowdsourcing or conducted research with human subjects...
(a) For a detailed description and intended uses, please refer to 1.

A.2 Dataset Accessibility
We plan to host and maintain this dataset on HuggingFace.

A.4 Dataset Examples
Example question-answer pairs are provided in Tables 9, 10, and 11.

Question "What does the symbol mean in Equation 1?"
Answer "The symbol in Equation 1 represents "follows this distribution".
"Can you provide more information about what is meant by 'generative process in ...
"The generative process refers to Eq. (2), which is a conceptual equation representing ...
Question "How does the DeepMoD method differ from what is written in/after Eq 3?"
Answer "We add noise only to ...
Question "How to do the adaptive attack based on Eq.(16)?
"By Maximizing the loss in Eq (16) using an iterative method such as PGD on the end-to-end model we attempt to maximize the loss to cause misclassification while ...
Question "How does the proposed method handle the imputed reward?"
"The proposed method uses the imputed reward in the second part of Equation 1, ...
"Table 2 is used to provide a comparison of the computational complexity of the ...
"Optimal number of clusters affected by the number of classes or similarity between ...
"The authors have addressed this concern by including a new experiment in Table 4 of ...
Question "Can you clarify the values represented in Table 1?"
Answer "The values in Table 1 represent the number of evasions, which shows the attack ...
"The experiments in table 1 do not seem to favor the proposed method much; softmax ... Can the authors explain why this might be the case?"
Answer "The proposed method reduces to empirical risk minimization with a proper loss, and ... However, the authors hope that addressing concerns about the method's theoretical ...
Question "Does the first row of Table 2 correspond to the offline method?"